No more pesky learning rates

Authors

  • Tom Schaul
  • Sixin Zhang
  • Yann LeCun
Abstract

The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time. The method relies on local gradient variations across samples. In our approach, learning rates can increase as well as decrease, making it suitable for non-stationary problems. Using a number of convex and non-convex learning tasks, we show that the resulting algorithm matches the performance of SGD or other adaptive approaches with their best settings obtained through systematic search, and effectively removes the need for learning rate tuning.
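
As an illustration of the idea described in the abstract, the following is a minimal sketch on a one-dimensional noisy quadratic. It is not the authors' reference implementation. It assumes a per-parameter rate of the form (running mean of gradients)^2 / (curvature estimate x running mean of squared gradients), together with an adaptive averaging window; these update rules are not stated in this abstract and should be treated as assumptions. The function name `vsgd_quadratic`, the known curvature `h`, and the 10-sample warm-up are hypothetical choices made for the example.

```python
import random


def vsgd_quadratic(theta0, h=2.0, theta_star=1.0, sigma=0.5, steps=2000, seed=0):
    """Tuning-free SGD sketch on the loss 0.5 * h * (theta - x)^2, x ~ N(theta_star, sigma^2).

    Assumptions (not taken from the abstract): the curvature h is known exactly,
    and the running averages are initialised from a short warm-up at theta0.
    """
    rng = random.Random(seed)
    theta = theta0

    # Warm-up: estimate the mean gradient and mean squared gradient at theta0.
    n0 = 10
    grads = [h * (theta - rng.gauss(theta_star, sigma)) for _ in range(n0)]
    g_bar = sum(grads) / n0                  # running mean of gradients
    v_bar = sum(g * g for g in grads) / n0   # running mean of squared gradients
    h_bar = h                                # running curvature estimate
    tau = float(n0)                          # length of the averaging window

    etas = []
    for _ in range(steps):
        x = rng.gauss(theta_star, sigma)
        g = h * (theta - x)                  # stochastic gradient for this sample

        # Update the running averages with the current window length.
        g_bar = (1.0 - 1.0 / tau) * g_bar + g / tau
        v_bar = (1.0 - 1.0 / tau) * v_bar + g * g / tau
        h_bar = (1.0 - 1.0 / tau) * h_bar + abs(h) / tau

        # Signal-to-noise ratio of the gradient: near 1 far from the optimum,
        # near 0 once sample noise dominates.
        ratio = g_bar * g_bar / v_bar
        eta = ratio / h_bar                  # learning rate can grow or shrink

        # Shorten the window when the gradient signal is strong, lengthen it otherwise.
        tau = (1.0 - ratio) * tau + 1.0

        theta -= eta * g
        etas.append(eta)
    return theta, etas


if __name__ == "__main__":
    final_theta, rates = vsgd_quadratic(theta0=5.0)
    print("final theta:", final_theta)
    print("first rate:", rates[0], "last rate:", rates[-1])
```

In this sketch no learning rate is supplied by the caller. The rate is large while the averaged gradient dominates its variance and shrinks as the iterate approaches the optimum, which is the behaviour the abstract attributes to using local gradient variations across samples.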

Similar resources

The chemokine ESkine/CCL27 displays novel modes of intracrine and paracrine function.

We have previously shown that the beta-chemokine ESkine/CCL27 is differentially spliced to produce two alternative forms. One is a secreted chemokine (ESkine), whereas the other (PESKY) lacks a signal peptide and is translocated to the nucleus. The role of this nuclear-targeted chemokine has not so far been defined, and it was the purpose of this study to examine this chemokine variant in more ...

Resolving Those Pesky X-ray Detector's Dead Time and Pileup Errors

No More Pesky Learning Rates: Supplementary Material

If we do gradient descent with $\eta^*(t)$, then almost surely, the algorithm converges (for the quadratic model). To prove that, we follow classical techniques based on Lyapunov stability theory (Bucy, 1965). Notice that the expected loss follows $\mathbb{E}\left[J\left(\theta^{(t+1)}\right) \mid \theta^{(t)}\right] = \frac{1}{2} h \cdot \left[\mathbb{E}\left[\left((1 - \eta^* h)(\theta^{(t)} - \theta^*) + \eta^* h \sigma \xi\right)^2\right] + \sigma^2\right] = \frac{1}{2} h \left[(1 - \eta^* h)^2 (\theta^{(t)} - \theta^*)^2 + (\eta^*)^2 h^2 \sigma^2 + \sigma^2\right]$ = ½ h σ² (θ (...

Learning Curve and Industry Structure: Evidences from Iranian Manufacturing Industries

Empirical studies have shown that cost advantages can arise from economies of scale and economies of learning. However, few studies have attempted to distinguish between these two effects on cost reduction. This paper is the first attempt in Iran to identify the impact of learning on cost reduction separately from the effect of economies of scale. Therefore, this study aims to ...

The Effects of Social Networks on Nursing Students’ Academic Achievement and Retention in Learning English

Introduction: The use of modern virtual technologies in the process of teaching-learning is inevitable. One example is the use of virtual social networks in education. The purpose of this study was to examine the effects of social networking on nursing students’ academic achievement and retention in learning English. Methods: The pretest-posttest design with a control group was used in this qua...

Publication date: 2013